Code and data files for this project are published on GitHub: cjunwon/STA-207

Initial report can be downloaded from here.

To skip to key conclusions from the initial analysis and new results, click here.

Original data source: Harvard Dataverse: Tennessee’s Student Teacher Achievement Ratio (STAR) project

1 Abstract

2 Introduction

3 Background & Motivation of Project STAR

Project STAR, short for the Student/Teacher Achievement Ratio Project, emerged in the mid-1980s as a pioneering effort to explore the relationship between class size and student academic outcomes. Motivated by policy concerns regarding the efficacy of smaller classes, the study was funded by the Tennessee General Assembly and implemented as a randomized controlled trial. In this experiment, students and teachers were randomly assigned to one of three classroom environments: small classes (13–17 students), regular classes (22–25 students), and regular classes with a teacher’s aide. This design was intended to isolate the effect of class size on academic performance while controlling for potential confounders, thereby providing strong evidence on the causal impacts of educational settings.

Beyond its initial focus on early childhood education, Project STAR was designed as a longitudinal study that followed students from kindergarten through third grade, and later into high school. This extended follow-up allowed researchers to investigate long-term outcomes, including high school achievement, graduation rates, and preparedness for higher education. The extensive dataset, which encompasses detailed academic records, teacher assessments, and demographic information, has been invaluable in shaping educational policy and research. By systematically analyzing the long-term effects of early educational interventions, Project STAR has contributed significantly to our understanding of how classroom environments influence academic trajectories and overall student success.

3.1 Dataset Details

# MASS is attached before dplyr so that dplyr::select() is not masked by MASS::select()
library(MASS)
library(dplyr)
library(ggplot2)
library(ggthemes)
library(knitr)
library(kableExtra)
library(patchwork)
library(car)
library(broom)
library(AER)
library(foreign)
library(forcats)
library(tidyr)
library(ggalluvial)
library(plotly)

STAR_Students <- read.spss('dataverse_files/PROJECT STAR/STAR_Students.sav', to.data.frame=TRUE)
# Comparison_Students <- read.spss('dataverse_files/PROJECT STAR/Comparison_Students.sav', to.data.frame=TRUE)
STAR_K3_Schools <- read.spss('dataverse_files/PROJECT STAR/STAR_K-3_Schools.sav', to.data.frame=TRUE)
STAR_High_Schools <- read.spss('dataverse_files/PROJECT STAR/STAR_High_Schools.sav', to.data.frame=TRUE)

This investigation utilizes the STAR-and-Beyond database from the Harvard Dataverse, which contains detailed information on students, teachers, and schools involved in Project STAR. The dataset includes records from the original STAR study, as well as follow-up data from high school if available.

The primary student-level data file contains information on 11,601 students who participated in the experimental phase for at least one year between 1985 and 1989. Information for each of grades K-3 includes:

  • Demographic variables
  • School and class identifiers
  • School and teacher information
  • Experimental condition (“class type”)
  • Norm-referenced and criterion-referenced achievement test scores
  • Motivation and self-concept scores

As part of the extended follow-up, the following were added to the records of some or all students:

  • Achievement test scores for the students when they were in grades 4 – 8, obtained from the Tennessee State Department of Education
  • Teachers’ ratings of student behavior in grades 4 and 8
  • Students’ self-reports of school engagement and peer effects in grade 8
  • Course taking in mathematics, science, and foreign language in high school, obtained from student transcripts
  • SAT/ACT participation and scores, obtained from ACT, Inc. and from Educational Testing Service
  • Graduation/dropout information, obtained from high school transcripts and the Tennessee State Department of Education

Note: This investigation does not necessarily encompass all variables in the dataset, but rather focuses on key areas of interest related to class size and student achievement (discussed in the subsequent sections).

4 Experimental design of Project STAR

Project STAR was initiated following the passage of House Bill (HB) 544 by the Tennessee Legislature in May 1985, aimed at investigating the effects of class size on student achievement and development in primary grades (K–3). The legislation outlined three primary research questions:

  1. What are the effects of reduced class sizes on student achievement (as measured by normed and criterion-referenced tests) and development outcomes such as self-concept and attendance?
  2. Are there cumulative advantages of remaining in smaller classes over multiple years compared to shorter-term exposure?
  3. Does specialized teacher training enhance student outcomes in reduced-size classes or classes assisted by teacher aides, compared to untrained teachers?

To implement this study, the Tennessee State Department of Education established a research consortium involving representatives from the Department, State Board of Education, State Superintendents’ Association, and four Tennessee universities. The study adhered to an experimental design, randomly assigning students entering kindergarten in 1985 or first grade in 1986 to one of three class conditions:

  • Small class (S): 13–17 students
  • Regular class (R): 22–25 students
  • Regular class with a full-time teacher aide (RA): 22–25 students

Randomization was executed by consortium members and supervised locally by university-affiliated graduate students, so that assignment was not biased by gender, race, or socioeconomic status.

4.1 Selection of Schools

All Tennessee schools were invited to participate under conditions set by the state, including the random assignment requirement, maintenance of standard school policies aside from class size adjustments, and commitment for four consecutive years. Of the initially interested 180 schools, 79 were ultimately selected from 42 districts to ensure representation of inner-city, suburban, urban, and rural settings:

  • 17 inner-city schools
  • 16 suburban schools
  • 8 urban schools
  • 38 rural schools

Participation fluctuated slightly due to mergers and withdrawals, primarily attributed to challenges maintaining randomization and administrative burdens. Consequently, the number of participating schools ranged from 79 in kindergarten to 75 by third grade.

4.2 Operational Adjustments

After the initial year, STAR administrators modified the study by randomly redistributing half of the students between regular (R) and regular-aide (RA) classes for subsequent years, because no significant differences in kindergarten performance were found between these two groups. Small-class assignments remained unchanged. This caveat is addressed in the subsequent analysis.

Teacher training occurred for a subset of second-grade teachers, with no significant difference in student achievement outcomes observed between trained and untrained teachers. Student mobility also influenced class composition, with new entrants randomly assigned while maintaining small-class constraints. This “class size drift” was documented and considered in subsequent analyses.

4.3 Data Collection and Measures

Academic performance was evaluated annually using the Stanford Achievement Tests (SATs) and the Tennessee Basic Skills First (BSF) tests. Student self-concept and motivation were measured using the SCAMIN inventory. Beyond third grade, additional longitudinal data were collected, including academic performance in grades 4–8 (via the Tennessee Comprehensive Assessment Program, TCAP), student participation and identification with school surveys, college entrance examination data (ACT/SAT), high school transcripts, and graduation/dropout information.

These detailed design features and rigorous methodologies positioned Project STAR as a landmark experimental study capable of robustly determining the causal impacts of class size on educational outcomes. However, the project did not come without limitations and challenges.

5 Shortcomings of Project STAR

Despite its robust experimental design, Project STAR has several notable limitations that should be acknowledged when interpreting its findings:

5.1 Attrition and Mobility Effects

Project STAR experienced considerable student mobility, resulting in many students not remaining in their assigned class types throughout the study period. Such mobility led to a phenomenon known as “class size drift,” where the actual sizes of regular classes sometimes became similar to those of small classes, potentially diluting the experimental contrast and complicating causal inference.
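One way to make class size drift concrete is to compare each class's observed enrollment with the nominal range for its assigned type. The sketch below is only an illustration on invented data: the data frame and the column names `class_id`, `class_type`, and `n_students` are hypothetical and are not taken from the STAR data files; only the nominal ranges (13–17 and 22–25) come from the study design.

```r
# Toy data: one row per class; all values are invented for illustration.
classes <- data.frame(
  class_id   = 1:6,
  class_type = c("small", "small", "regular", "regular",
                 "regular-aide", "regular-aide"),
  n_students = c(15, 19, 23, 18, 24, 21)
)

# Nominal enrollment ranges from the STAR design.
nominal <- data.frame(
  class_type = c("small", "regular", "regular-aide"),
  min_size   = c(13, 22, 22),
  max_size   = c(17, 25, 25)
)

# Attach the nominal range to each class and flag classes outside it.
drift <- merge(classes, nominal, by = "class_type")
drift$out_of_range <- drift$n_students < drift$min_size |
  drift$n_students > drift$max_size

# Share of classes of each type whose enrollment falls outside the design range.
drift_share <- tapply(drift$out_of_range, drift$class_type, mean)
print(drift_share)
```

A regular class that shrinks to 18 students, as in the toy data, sits close to the small-class range, which is the dilution of the experimental contrast described above.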

5.2 Generalizability Concerns

The purposeful selection of schools, which aimed to cover diverse geographic and socioeconomic areas within Tennessee, might limit the external validity of the findings. Specifically, Project STAR schools were slightly larger and had slightly lower initial achievement scores compared to statewide averages, raising questions about how representative the findings are for other educational contexts.

5.3 Teacher Training and Implementation Fidelity

The project provided only limited teacher training, which did not specifically equip teachers to leverage smaller class sizes effectively. Additionally, training was not uniformly administered, and there was no demonstrated impact of the training itself. Thus, differences in instructional quality or consistency across classes might have influenced outcomes, independent of class size.

5.4 Short-term vs. Long-term Effects

Although the study was longitudinal, it only maintained controlled class-size conditions through grade three, after which students returned to standard-sized classes. The analysis of longer-term effects beyond third grade thus faces challenges in isolating the direct impact of early exposure to small classes from subsequent educational experiences.

5.5 Limited Control of Classroom Dynamics

Aside from controlling for class size and the presence of aides, the study deliberately maintained “normal” school operations. This approach meant that other important classroom variables, such as teaching methods, curriculum variations, and peer dynamics, remained uncontrolled, potentially confounding the observed effects.

6 Key Conclusions from the Initial Analysis Report

In the initial analysis of Project STAR data, we were primarily interested in answering the following two questions:

Primary question: Are there any differences in math scaled scores in 1st grade across class types?

Secondary question: If there are differences, which class type is associated with the highest math scaled scores in 1st grade?

To answer these questions, we adopted a two-way ANOVA model with the following structure:

\[Y_{ijk} = \mu_{..} + \alpha_{i} + \beta_{j} + \epsilon_{ijk}\] where the index \(i\) represents the class type: small (\(i=1\)), regular (\(i=2\)), regular with aide (\(i=3\)), and the index \(j\) represents the school indicator. The rest of the parameters are as follows:

  • \(Y_{ijk}\): Math scaled score of student \(k\) in class type \(i\) at school \(j\)
  • \(\mu_{..}\): Overall mean math score
  • \(\alpha_{i}\): Main effect of class type \(i\), with constraint \(\sum_{i=1}^{3} \alpha_{i} = 0\)
  • \(\beta_{j}\): Main effect of school \(j\), with constraint \(\sum_{j} \beta_{j} = 0\)
  • \(\epsilon_{ijk}\): Random error in the math score of student \(k\) in class type \(i\) at school \(j\)
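The sum-to-zero constraints can be imposed directly when fitting the model in R. The following is a minimal sketch on simulated data, not the fit from the initial report: the data frame `sim`, the effect sizes, and the number of schools are all assumptions made for illustration.

```r
set.seed(207)

# Simulated data: 3 class types x 4 schools, 10 students per cell (invented).
sim <- expand.grid(
  student    = 1:10,
  class_type = c("small", "regular", "regular-aide"),
  school     = paste0("school_", 1:4)
)
sim$class_type <- factor(sim$class_type,
                         levels = c("small", "regular", "regular-aide"))
sim$school <- factor(sim$school)

# Assumed true effects, chosen so that each set sums to zero.
alpha <- c(small = 8, regular = -5, `regular-aide` = -3)
beta  <- c(school_1 = 4, school_2 = -4, school_3 = 6, school_4 = -6)
sim$math <- 530 +
  unname(alpha[as.character(sim$class_type)]) +
  unname(beta[as.character(sim$school)]) +
  rnorm(nrow(sim), sd = 10)

# contr.sum imposes the sum-to-zero constraints on both factors.
fit <- lm(math ~ class_type + school, data = sim,
          contrasts = list(class_type = "contr.sum", school = "contr.sum"))

# Only the first two class-type effects are estimated directly; the third
# follows from the constraint alpha_1 + alpha_2 + alpha_3 = 0.
alpha_hat <- coef(fit)[c("class_type1", "class_type2")]
alpha_hat <- c(alpha_hat, class_type3 = -sum(alpha_hat))
print(round(alpha_hat, 2))
```

With this parameterization the intercept estimates \(\mu_{..}\), and each reported coefficient is a deviation from it rather than a difference from a reference level.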

The assumptions of the two-way ANOVA model are as follows:

  • Independence: The residuals \(\epsilon_{ijk}\) are independent of each other
  • Normality: The residuals \(\epsilon_{ijk}\) are normally distributed
  • Homoscedasticity: The variance of the residuals \(\epsilon_{ijk}\) is constant across all levels of the independent variables
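Two of these assumptions can be checked from the fitted residuals. The sketch below uses simulated data and base R tests; the choice of the Shapiro-Wilk and Bartlett tests is an assumption made for illustration and is not necessarily what the initial report used. Independence is a property of the design and is not tested here.

```r
set.seed(207)

# Simulated scores for 3 class types in 4 schools (invented values).
sim <- expand.grid(student    = 1:10,
                   class_type = c("small", "regular", "regular-aide"),
                   school     = paste0("school_", 1:4))
sim$math <- 530 + rnorm(nrow(sim), sd = 10)

# Additive two-way model, then extract residuals.
fit <- aov(math ~ class_type + school, data = sim)
sim$res <- residuals(fit)

# Normality: Shapiro-Wilk test on the residuals.
normality <- shapiro.test(sim$res)

# Homoscedasticity: Bartlett's test of equal residual variance across class types.
equal_var <- bartlett.test(res ~ class_type, data = sim)

print(normality$p.value)
print(equal_var$p.value)
```

Calling `plot(fit, which = 1:2)` adds the residuals-versus-fitted and normal Q-Q plots as visual complements. `car::leveneTest()` is a common alternative to Bartlett's test that is less sensitive to non-normality.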

We answered the primary question of interest by conducting an F-test to determine if there are significant differences in math scaled scores across class types. The null and alternative hypotheses were as follows:

  • \(H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0\) (No significant differences in math scaled scores across class types)
  • \(H_A: \text{At least one } \alpha_i \neq 0\) (Significant differences in math scaled scores across class types)

The F-test relies on normality of the residuals and homoscedasticity, the same assumptions as the two-way ANOVA model.

The F-test results indicated that the p-value for class type (star1) is less than 0.05, suggesting that there are significant differences in math scaled scores across class types. We rejected the null hypothesis and concluded that at least one class type has a significantly different mean math score compared to the others.

We implemented the Tukey HSD test to find that students in small classes have significantly higher math scores compared to students in regular classes and regular classes with an aide. However, there was no significant difference between regular classes and regular classes with an aide.
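The F-test and Tukey HSD steps can be sketched as follows. This is an illustration on simulated data in which small classes are given a higher mean by construction; it does not reproduce the numbers from the initial report, and the data frame `sim` and the size of the shift are assumptions.

```r
set.seed(207)

# Simulated first-grade scores; the +12 shift for small classes is invented.
sim <- expand.grid(student    = 1:15,
                   class_type = c("small", "regular", "regular-aide"),
                   school     = paste0("school_", 1:4))
shift <- c(small = 12, regular = 0, `regular-aide` = 0)
sim$math <- 530 + unname(shift[as.character(sim$class_type)]) +
  rnorm(nrow(sim), sd = 10)

fit <- aov(math ~ class_type + school, data = sim)

# F-test for the class-type main effect (H0: all alpha_i equal zero).
f_table <- summary(fit)[[1]]
p_class <- f_table[1, "Pr(>F)"]   # row 1 is class_type, the first model term
print(p_class)

# Tukey HSD: all pairwise class-type comparisons with family-wise error control.
tukey <- TukeyHSD(fit, which = "class_type", conf.level = 0.95)
print(tukey$class_type)
```

Each row of the Tukey output gives a pairwise difference in means, its confidence interval, and an adjusted p-value. `aov()` uses sequential sums of squares, so with unbalanced data such as the real STAR records the order of terms matters; `car::Anova()` is one way to obtain Type II tests instead.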

7 Caveats of Initial Analysis Report

While the initial analysis report provided some valuable insights into the short-term effects of class size on student math scores in 1st grade, several caveats and limitations should be considered:

  1. Limited Focus on Math Scaled Scores: The analysis primarily focused on math scaled scores as the outcome variable, neglecting other subjects or measures of student achievement. This narrow focus might not capture the full spectrum of educational outcomes influenced by class size. Utilizing scores from other subjects or broader achievement metrics could provide a more comprehensive understanding of the impact of class size on student learning, and would allow a fuller comparison of the long-term effects of class size on student achievement.

  2. Short-term Analysis: The initial analysis only considered the math scores of 1st-grade students, providing a snapshot of the immediate effects of class size on academic performance. While this short-term perspective is valuable, it fails to capture the long-term implications of early educational experiences. A more comprehensive analysis that tracks student outcomes over multiple grades and years would offer a more nuanced understanding of how class size influences academic trajectories. This would require a longitudinal approach that follows students beyond the early grades and into high school and beyond.

  3. Operational Adjustments Post-1st Grade: The initial analysis did not account for the operational adjustments made after the first year of the study, such as the redistribution of students between regular and regular-aide classes. While most students who were designated to small classes continued in that setting, students in regular and regular-aide classes were randomly reassigned. In an update in 1999, it was reported that class size and pupil-teacher ratios (PTR) are not the same, and that PTR does not influence student outcomes. Therefore, for further analysis, it is simpler and more accurate to treat class size as either small or regular (with and without an aide combined).
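The small-versus-regular recoding suggested in the third caveat can be sketched in base R as follows. The vector `class_type` is a toy example whose labels mirror those used in this report; the name `class_size` is hypothetical.

```r
# Toy vector of class-type labels (invented for illustration).
class_type <- factor(
  c("small", "regular", "regular-aide", "small", "regular-aide"),
  levels = c("small", "regular", "regular-aide")
)

# Merge "regular" and "regular-aide" into a single "regular" level.
class_size <- class_type
levels(class_size) <- list(small   = "small",
                           regular = c("regular", "regular-aide"))

print(table(class_size))
```

`forcats::fct_collapse()` offers the same recoding in tidyverse style, which may fit better inside a `mutate()` call.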

data_alluvium <- subset(STAR_Students, select = c(gkclasstype, g1classtype, g2classtype, g3classtype))

class_levels <- c("small", "regular", "regular-aide")
data_alluvium <- data_alluvium %>%
  mutate(across(everything(), ~ factor(as.numeric(.),
                                       levels = c(1, 2, 3),
                                       labels = class_levels))) %>%
  mutate(across(everything(), ~ fct_explicit_na(., na_level = "Unknown (NA)")))

grade_mapping <- c("gkclasstype" = "Kindergarten",
                   "g1classtype" = "Grade 1",
                   "g2classtype" = "Grade 2",
                   "g3classtype" = "Grade 3")

pairs <- list(c("gkclasstype", "g1classtype"),
              c("g1classtype", "g2classtype"),
              c("g2classtype", "g3classtype"))

flow_list <- lapply(pairs, function(x) {
  data_alluvium %>%
    group_by(across(all_of(x))) %>%
    summarise(value = n(), .groups = "drop") %>%
    mutate(
      source = paste(grade_mapping[x[1]], ": ", .[[x[1]]], sep = ""),
      target = paste(grade_mapping[x[2]], ": ", .[[x[2]]], sep = "")
    ) %>%
    select(source, target, value)
})

transitions <- bind_rows(flow_list)

nodes <- unique(c(transitions$source, transitions$target))
nodes_df <- data.frame(name = nodes, stringsAsFactors = FALSE)
nodes_df$index <- seq_len(nrow(nodes_df)) - 1

transitions <- transitions %>%
  left_join(nodes_df, by = c("source" = "name")) %>%
  rename(source_index = index) %>%
  left_join(nodes_df, by = c("target" = "name")) %>%
  rename(target_index = index)


get_class <- function(x) sub(".*: ", "", x)

nodes_df$class_type <- sapply(nodes_df$name, get_class)


class_colors <- c(
  "small"        = "#F5C310",  # yellow
  "regular"      = "#0072B2",  # blue
  "regular-aide" = "#009E73",  # green
  "Unknown (NA)" = "#D55E00"   # reddish/orange
)

nodes_df$color <- class_colors[nodes_df$class_type]


transitions$link_color <- scales::alpha(nodes_df$color[ match(transitions$source, nodes_df$name) ], 0.4)



p <- plot_ly(
  type = "sankey",
  orientation = "h",

  arrangement = "snap",
  node = list(
    label = nodes_df$name,
    pad = 15,
    thickness = 20,
    line = list(color = "black", width = 0.5),
    color = nodes_df$color
  ),
  link = list(
    source = transitions$source_index,
    target = transitions$target_index,
    value = transitions$value,
    color = transitions$link_color  # transparency is already set via scales::alpha()
  )
) %>%
  layout(
    title = "Alluvial Diagram of Students' Transfer",
    font = list(size = 10),
    width = 800
  )

p

data_alluvium <- STAR_Students %>%
  select(gkclasstype, g1classtype, g2classtype, g3classtype) %>%
  drop_na() %>%
  mutate(across(everything(), ~ factor(as.numeric(.),
                                       levels = c(1, 2, 3),
                                       labels = c("small", "regular", "regular-aide"))))

grade_mapping <- c("gkclasstype" = "Kindergarten",
                   "g1classtype" = "Grade 1",
                   "g2classtype" = "Grade 2",
                   "g3classtype" = "Grade 3")


pairs <- list(c("gkclasstype", "g1classtype"),
              c("g1classtype", "g2classtype"),
              c("g2classtype", "g3classtype"))

flow_list <- lapply(pairs, function(x) {
  data_alluvium %>%
    group_by(across(all_of(x))) %>%
    summarise(value = n(), .groups = "drop") %>%
    mutate(
      source = paste(grade_mapping[x[1]], ": ", .[[x[1]]], sep = ""),
      target = paste(grade_mapping[x[2]], ": ", .[[x[2]]], sep = "")
    ) %>%
    select(source, target, value)
})


transitions <- bind_rows(flow_list)


nodes <- unique(c(transitions$source, transitions$target))
nodes_df <- data.frame(name = nodes, stringsAsFactors = FALSE)
nodes_df$index <- seq_len(nrow(nodes_df)) - 1


transitions <- transitions %>%
  left_join(nodes_df, by = c("source" = "name")) %>%
  rename(source_index = index) %>%
  left_join(nodes_df, by = c("target" = "name")) %>%
  rename(target_index = index)


get_class <- function(x) sub(".*: ", "", x)
nodes_df$class_type <- sapply(nodes_df$name, get_class)

class_colors <- c(
  "small"        = "#F5C310",  # yellow
  "regular"      = "#0072B2",  # blue
  "regular-aide" = "#009E73"   # green
)


nodes_df$color <- class_colors[nodes_df$class_type]


transitions$link_color <- scales::alpha(nodes_df$color[ match(transitions$source, nodes_df$name) ], 0.4)


p <- plot_ly(
  type = "sankey",
  orientation = "h",
  arrangement = "snap",
  node = list(
    label = nodes_df$name,
    pad = 15,
    thickness = 20,
    line = list(color = "black", width = 0.5),
    color = nodes_df$color
  ),
  link = list(
    source = transitions$source_index,
    target = transitions$target_index,
    value = transitions$value,
    color = transitions$link_color
  )
) %>%
  layout(
    title = "Alluvial Diagram of Students' Transfer (NA Removed)",
    font = list(size = 10),
    width = 800
  )

p

With these caveats in mind, we aim to extend the analysis of Project STAR data to explore the long-term effects of class size on student academic achievement. The natural new question of interest is:

Extended question: For students who complete both primary and secondary education with the objective of pursuing higher education (college), does the exposure to small class sizes in early education (K-3) have a significant impact on their high school academic performance and college readiness?

8 Answering the New Question of Interest by Addressing the Caveats

One more point to keep in mind is that Tennessee implemented a new student assessment system the year STAR students entered grade 4, the Tennessee Comprehensive Assessment Program (TCAP). The TCAP assessment battery included norm-referenced tests from the Comprehensive Tests of Basic Skills (CTBS/McGraw Hill, 1989) and BSF criterion-referenced tests for each grade in reading and mathematics. Scores on these tests were made available by the Tennessee State Department of Education, as students progressed from grade 4 (1989-1990) through grade 8 (1993-1994).

The user guide notes that “Scores on the CTBS are not directly comparable to those on the SATs. However, IRT scale scores were available for each CTBS subtest so that comparisons can be made meaningfully across grades 4—8.” Hence the scaled scores are valid for comparison across grades 4-8.

8.1 Subsetting the Data for Appropriate Analysis

The new question of interest requires a subset of students who have completed both primary and secondary education and for whom we have complete data on academic performance and college readiness. To achieve this, we filter the dataset using the binary flag variables in the data file, which indicate participation or non-participation at each stage of data collection.

Flag Type | Variable Name | Description
--- | --- | ---
In-STAR Flags | flagsgk | In STAR in kindergarten
In-STAR Flags | flagsg1 | In STAR in grade 1
In-STAR Flags | flagsg2 | In STAR in grade 2
In-STAR Flags | flagsg3 | In STAR in grade 3
Achievement-data Flags | flaggk | Achievement data available kindergarten
Achievement-data Flags | flagg1 | Achievement data available grade 1
Achievement-data Flags | flagg2 | Achievement data available grade 2
Achievement-data Flags | flagg3 | Achievement data available grade 3
Achievement-data Flags | flagg4 | Achievement data available grade 4
Achievement-data Flags | flagg5 | Achievement data available grade 5
Achievement-data Flags | flagg6 | Achievement data available grade 6
Achievement-data Flags | flagg7 | Achievement data available grade 7
Achievement-data Flags | flagg8 | Achievement data available grade 8
High School Data Flags | flagsatact | Valid SAT/ACT score available
High School Data Flags | flaghscourse | At least two years of high school course data
High School Data Flags | flaghsgraduate | Data on high school graduation status available

Our subset requires the Achievement-data Flags and High School Data Flags to be “YES” for each student. This ensures that we have complete achievement data from kindergarten through grade 8, together with high school course, SAT/ACT, and graduation data. We should note that the question of interest is specific to students who have completed both primary and secondary education. Therefore, any concerns about selection bias are beyond the scope of this analysis.

STAR_flag_vars <- c('FLAGSGK',
                    'FLAGSG1',
                    'FLAGSG2',
                    'FLAGSG3')

Achievement_flag_vars <- c('flaggk',
                           'flagg1',
                           'flagg2',
                           'flagg3',
                           'flagg4',
                           'flagg5',
                           'flagg6',
                           'flagg7',
                           'flagg8')

HS_College_flag_var <- c('flagsatact',
                         'flaghscourse',
                         'flaghsgraduate')

# Subset students who have Achievement_flag_vars and HS_College_flag_var columns == 'YES'
All_Grades_Students <- STAR_Students %>%
  filter(if_all(all_of(Achievement_flag_vars), ~ . == "YES") & 
         if_all(all_of(HS_College_flag_var), ~ . == "YES"))

8.1.1 Data Completeness and Proportion of Students with Complete Data for Longitudinal Analysis

Achievement_data_students <- STAR_Students %>%
  filter(if_all(all_of(Achievement_flag_vars), ~ . == "YES"))

HS_College_data_students <- STAR_Students %>%
  filter(if_all(all_of(HS_College_flag_var), ~ . == "YES"))

student_count <- nrow(STAR_Students)


complete_student_count1 <- nrow(Achievement_data_students)
incomplete_student_count1 <- student_count - complete_student_count1
counts1 <- c(complete_student_count1, incomplete_student_count1)
percent1 <- round(100 * counts1 / student_count, 1)
labels1 <- paste(c("Data Available", "Data \n Unavailable"), 
                 "\nCount:", counts1, 
                 "\n", percent1, "%")


complete_student_count2 <- nrow(HS_College_data_students)
incomplete_student_count2 <- student_count - complete_student_count2
counts2 <- c(complete_student_count2, incomplete_student_count2)
percent2 <- round(100 * counts2 / student_count, 1)
labels2 <- paste(c("Data Available", "Data \n Unavailable"), 
                 "\nCount:", counts2, 
                 "\n", percent2, "%")


complete_student_count3 <- nrow(All_Grades_Students)
incomplete_student_count3 <- student_count - complete_student_count3
counts3 <- c(complete_student_count3, incomplete_student_count3)
percent3 <- round(100 * counts3 / student_count, 1)
labels3 <- paste(c("Data Available", "Data \n Unavailable"), 
                 "\nCount:", counts3, 
                 "\n", percent3, "%")


par(mfrow = c(2, 2), mar = c(1, 4, 1, 4))

pie(counts1, labels = labels1, 
    main = "Achievement Data Completeness", col = c("lightblue", "lightgray"))
pie(counts2, labels = labels2, 
    main = "High School Data Completeness", col = c("lightgreen", "lightgray"))
pie(counts3, labels = labels3, 
    main = "Achievement & High School Completeness", col = c("lightcoral", "lightgray"))


par(mfrow = c(1, 1))

We notice that the proportion of students with complete data for both achievement and high school/college readiness is relatively low compared to the total number of students in the dataset. This highlights the challenges associated with longitudinal studies and the importance of data completeness for robust analyses. Despite the limitations, we are still left with 546 students who have complete data for the longitudinal analysis.

Any results or conclusions beyond this point are based on these 546 students, who have complete data for the analysis.

8.1.2 Variables of Interest for Longitudinal Analysis

We keep the following variables for the longitudinal analysis:

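Type | Variable Name | Description
--- | --- | ---
Demographic Variables | stdntid | Student ID
Class Type & STAR Participation | gkclasstype | Class type in kindergarten
Class Type & STAR Participation | g1classtype | Class type in grade 1
Class Type & STAR Participation | g2classtype | Class type in grade 2
Class Type & STAR Participation | g3classtype | Class type in grade 3
Class Type & STAR Participation | cmpstype | Class type composite
Class Type & STAR Participation | cmpsdura | Duration composite
Class Type & STAR Participation | yearsstar | Number of years in STAR
Reading & Math Scores | gktreadss | Total reading scale score SAT kindergarten
Reading & Math Scores | gktmathss | Total math scale score SAT kindergarten
Reading & Math Scores | g1treadss | Total reading scale score SAT Grade 1
Reading & Math Scores | g1tmathss | Total math scale score SAT Grade 1
Reading & Math Scores | g2treadss | Total reading scale score SAT Grade 2
Reading & Math Scores | g2tmathss | Total math scale score SAT Grade 2
Reading & Math Scores | g3treadss | Total reading scale score SAT Grade 3
Reading & Math Scores | g3tmathss | Total math scale score SAT Grade 3
Reading & Math Scores | g4treadss | Total reading scale score CTBS Grade 4
Reading & Math Scores | g4tmathss | Total math scale score CTBS Grade 4
Reading & Math Scores | g5treadss | Total reading scale score CTBS Grade 5
Reading & Math Scores | g5tmathss | Total math scale score CTBS Grade 5
Reading & Math Scores | g6treadss | Total reading scale score CTBS Grade 6
Reading & Math Scores | g6tmathss | Total math scale score CTBS Grade 6
Reading & Math Scores | g7treadss | Total reading scale score CTBS Grade 7
Reading & Math Scores | g7tmathss | Total math scale score CTBS Grade 7
Reading & Math Scores | g8treadss | Total reading scale score CTBS Grade 8
Reading & Math Scores | g8tmathss | Total math scale score CTBS Grade 8
High School Performance | hsgpaoverall | GPA overall high school
High School Performance | hsactcomp | ACT composite score high school
High School Performance | hsactconverted | SAT -> ACT (test score reported in ACT composite metric) high school
High School Performance | hsgrdcol | High school graduation status
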
SAT_students <- STAR_Students %>%
  filter(hssat == "YES" & hsact == "NO")

ACT_students <- STAR_Students %>%
  filter(hsact == "YES" & hssat == "NO")

SAT_ACT_students <- STAR_Students %>%
  filter(hssat == "YES" & hsact == "YES")

no_SAT_ACT_students <- STAR_Students %>%
  filter(hssat == "NO" & hsact == "NO")

num_SAT_students <- nrow(SAT_students)
num_ACT_students <- nrow(ACT_students)
num_SAT_ACT_students <- nrow(SAT_ACT_students)
num_no_SAT_ACT_students <- nrow(no_SAT_ACT_students)


df <- data.frame(
  Category = c("SAT Only", "ACT Only", "Both SAT & ACT", "Neither SAT nor ACT"),
  Count = c(num_SAT_students, num_ACT_students, num_SAT_ACT_students, num_no_SAT_ACT_students)
)


df$Percentage <- round(df$Count / sum(df$Count) * 100, 1)


pie_chart4 <- plot_ly(
  data = df,
  labels = ~Category,
  values = ~Percentage, 
  type = 'pie',
  textinfo = 'percent',
  hoverinfo = 'label+percent+text',
  text = ~paste0("Count: ", Count),
  marker = list(colors = c("lightblue", "lightgreen", "lightcoral", "lightgray")),
  title = "SAT and ACT Participation (all students)",
  width = 500, height = 400
)

SAT_students <- All_Grades_Students %>%
  filter(hssat == "YES" & hsact == "NO")

ACT_students <- All_Grades_Students %>%
  filter(hsact == "YES" & hssat == "NO")

SAT_ACT_students <- All_Grades_Students %>%
  filter(hssat == "YES" & hsact == "YES")

no_SAT_ACT_students <- All_Grades_Students %>%
  filter(hssat == "NO" & hsact == "NO")

num_SAT_students <- nrow(SAT_students)
num_ACT_students <- nrow(ACT_students)
num_SAT_ACT_students <- nrow(SAT_ACT_students)
num_no_SAT_ACT_students <- nrow(no_SAT_ACT_students)


df <- data.frame(
  Category = c("SAT Only", "ACT Only", "Both SAT & ACT", "Neither SAT nor ACT"),
  Count = c(num_SAT_students, num_ACT_students, num_SAT_ACT_students, num_no_SAT_ACT_students)
)


df$Percentage <- round(df$Count / sum(df$Count) * 100, 1)


pie_chart5 <- plot_ly(
  data = df,
  labels = ~Category,
  values = ~Percentage, 
  type = 'pie',
  textinfo = 'percent',
  hoverinfo = 'label+percent+text',
  text = ~paste0("Count: ", Count),
  marker = list(colors = c("lightblue", "lightgreen", "lightcoral", "lightgray")),
  title = "SAT and ACT Participation (subsetted students)",
  width = 500, height = 400
)

# Subset data for STAR_Students
SAT_students <- STAR_Students[STAR_Students$hssat == "YES" & STAR_Students$hsact == "NO", ]
ACT_students <- STAR_Students[STAR_Students$hsact == "YES" & STAR_Students$hssat == "NO", ]
SAT_ACT_students <- STAR_Students[STAR_Students$hssat == "YES" & STAR_Students$hsact == "YES", ]
no_SAT_ACT_students <- STAR_Students[STAR_Students$hssat == "NO" & STAR_Students$hsact == "NO", ]

num_SAT_students <- nrow(SAT_students)
num_ACT_students <- nrow(ACT_students)
num_SAT_ACT_students <- nrow(SAT_ACT_students)
num_no_SAT_ACT_students <- nrow(no_SAT_ACT_students)

# Data for first pie chart
counts1 <- c(num_ACT_students, num_SAT_students, num_SAT_ACT_students, num_no_SAT_ACT_students)
labels1 <- c("ACT Only", "SAT Only", "Both SAT & ACT", "Neither SAT nor ACT")
percentages1 <- round(counts1 / sum(counts1) * 100, 1)
labels1 <- paste(labels1, "\n", percentages1, "%")

# Subset data for All_Grades_Students
SAT_students <- All_Grades_Students[All_Grades_Students$hssat == "YES" & All_Grades_Students$hsact == "NO", ]
ACT_students <- All_Grades_Students[All_Grades_Students$hsact == "YES" & All_Grades_Students$hssat == "NO", ]
SAT_ACT_students <- All_Grades_Students[All_Grades_Students$hssat == "YES" & All_Grades_Students$hsact == "YES", ]

num_SAT_students <- nrow(SAT_students)
num_ACT_students <- nrow(ACT_students)
num_SAT_ACT_students <- nrow(SAT_ACT_students)

# Data for second pie chart
counts2 <- c(num_ACT_students, num_SAT_students, num_SAT_ACT_students)
labels2 <- c("ACT Only", "SAT Only", "Both SAT & ACT")
percentages2 <- round(counts2 / sum(counts2) * 100, 1)
# paste0 avoids the stray spaces that paste() inserts around "\n" and "%"
labels2 <- paste0(labels2, "\n", percentages2, "%")

# Set graphical parameters for side-by-side plots
par(mfrow = c(1, 2), mar = c(1, 4, 10, 4))

# Pie chart for STAR_Students
pie(counts1, labels = labels1, col = c("lightgreen", "lightblue", "lightcoral", "lightgray"),
    main = "SAT & ACT Participation \n (All Students)")

# Pie chart for All_Grades_Students
pie(counts2, labels = labels2, col = c("lightgreen", "lightblue", "lightcoral"),
    main = "SAT & ACT Participation \n (Subsetted Students)")

# Reset graphical parameters
par(mfrow = c(1, 1))
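
One caveat about the counts above: bracket subsetting with a logical condition keeps a row of NAs wherever the condition evaluates to NA, so `nrow()` can overstate a category if `hssat` or `hsact` contain missing values. The sketch below illustrates this on a small made-up data frame (`toy` and its values are hypothetical, and whether the STAR columns actually contain NAs is an assumption to check), along with one NA-safe way to count.

```r
# Toy data standing in for STAR_Students (hypothetical values).
# The NA entries are an assumption about how missing exam indicators are coded.
toy <- data.frame(
  hssat = c("YES", "NO", NA, "YES", "NO", NA),
  hsact = c("NO", "YES", "NO", "YES", "NO", NA)
)

# Bracket subsetting: rows where the condition is NA come back as all-NA rows,
# so they are included in nrow()
naive_sat_only <- nrow(toy[toy$hssat == "YES" & toy$hsact == "NO", ])

# sum(..., na.rm = TRUE) counts only the rows where the condition is TRUE
safe_sat_only <- sum(toy$hssat == "YES" & toy$hsact == "NO", na.rm = TRUE)

naive_sat_only  # 3 (one real match plus two NA rows)
safe_sat_only   # 1
```

If the indicator columns have no missing values, both approaches give the same counts.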

To use standardized college entrance exam scores in the analysis, ACT and SAT scores must be comparable (i.e., on the same scale). ACT composite scores range from 1 to 36, while SAT scores ranged from 400 to 1600 at the time of the study. Although the SAT scale changed in the intervening years, it has returned to the same 400-1600 range as of this report.

Converting scores from one exam's scale to the other inevitably loses some information, but it is necessary for meaningful comparisons. To minimize this loss, we keep the more widely taken ACT composite score, hsactcomp (taken by 91% of subsetted students), as is and use hsactconverted, the SAT composite score converted to the ACT scale, in the analysis.

variables_of_interest <- c(
  'stdntid',
  'gkclasstype',
  'g1classtype',
  'g2classtype',
  'g3classtype',
  'cmpstype',
  'cmpsdura',
  'yearsstar',
  'gktreadss',
  'gktmathss',
  'g1treadss',
  'g1tmathss',
  'g2treadss',
  'g2tmathss',
  'g3treadss',
  'g3tmathss',
  'g4treadss',
  'g4tmathss',
  'g5treadss',
  'g5tmathss',
  'g6treadss',
  'g6tmathss',
  'g7treadss',
  'g7tmathss',
  'g8treadss',
  'g8tmathss',
  'hsgpaoverall',
  'hsactcomp',
  'hsactconverted',
  'hsgrdcol'
)

# Keep variables_of_interest from All_Grades_Students

subsetted_data <- All_Grades_Students %>%
  select(all_of(variables_of_interest))

8.2 Supporting Analysis 1: Verifying Short Term Results with Subsetted Data

In this section, we will verify the initial analysis results with the subset of students who have complete data for the longitudinal analysis.

# Table of percentages and counts for class type (`cmpstype`)

class_type_table <- subsetted_data %>%
  group_by(cmpstype) %>%
  summarise(count = n(), .groups = "drop") %>%
  mutate(percentage = sprintf("%.2f%%", count / sum(count) * 100))

# Print using kable
head(class_type_table) %>%
  kable(caption = "Distribution of Class Types in Subsetted Data") %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1, bold = TRUE)

Distribution of Class Types in Subsetted Data

cmpstype   count   percentage
SMALL        185       33.88%
REGULAR       66       12.09%
AIDE         255       46.70%
NA            40        7.33%

8.3 Supporting Analysis 2: Impact of Class Size on Grades 4-8 Achievement

8.4 Supporting Analysis 3: Impact of Class Size on High School GPA

# histogram of high school GPA

gpa_hist <- ggplot(subsetted_data, aes(x = hsgpaoverall)) +
                    geom_histogram(binwidth = 0.5, fill = "skyblue", color = "black") +
                    labs(title = "Distribution of High School GPA",
                         x = "High School GPA",
                         y = "Frequency") +
                    theme_minimal()
# Side by side boxplots of high school GPA by class type (`cmpstype`)
gpa_boxplot <- ggplot(subsetted_data, aes(x = cmpstype, y = hsgpaoverall)) +
                        geom_boxplot(fill = "skyblue",
                                      color = "black") +
                        labs(title = "High School GPA by Class Type",
                             x = "Class Type",
                             y = "High School GPA") +
                        theme_minimal()

gpa_hist + gpa_boxplot

8.5 Supporting Analysis 4: Impact of Class Size on College Readiness Exam (ACT/SAT Scores)

We create a new column, college_readiness, that holds each student's ACT composite score or SAT score converted to the ACT scale. If a student took both exams, we use the higher of the two scores; if only one score is available, that score is used. This combined score serves as a proxy for college readiness.

subsetted_data <- subsetted_data %>%
  mutate(college_readiness = pmax(hsactcomp, hsactconverted, na.rm = TRUE))
# Histogram of college readiness scores

readiness_hist <- ggplot(subsetted_data, aes(x = college_readiness)) +
                          geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
                          labs(title = "Distribution of College Readiness Scores",
                               x = "College Readiness Score",
                               y = "Frequency") +
                          theme_minimal()


# Side by side boxplots of college readiness scores by class type (`cmpstype`)

readiness_boxplot <- ggplot(subsetted_data, aes(x = cmpstype, y = college_readiness)) +
                              geom_boxplot(fill = "skyblue",
                                            color = "black") +
                              labs(title = "College Readiness Scores by Class Type",
                                   x = "Class Type",
                                   y = "College Readiness Score") +
                              theme_minimal()

readiness_hist + readiness_boxplot
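
To make the behavior of `pmax(..., na.rm = TRUE)` in the college_readiness definition concrete, here is a minimal sketch on made-up scores (the vectors `act_score` and `sat_as_act` are hypothetical stand-ins for `hsactcomp` and `hsactconverted`). It shows the three cases that matter: only one exam taken, both taken, and neither taken.

```r
# Hypothetical scores for four students, both on the ACT scale
act_score  <- c(21, NA, 30, NA)   # stand-in for hsactcomp
sat_as_act <- c(NA, 24, 28, NA)   # stand-in for hsactconverted

# Element-wise maximum; with na.rm = TRUE a missing value is ignored
# unless both inputs are missing for that student
combined <- pmax(act_score, sat_as_act, na.rm = TRUE)

combined
# Student 1: ACT only   -> 21
# Student 2: SAT only   -> 24
# Student 3: both exams -> 30 (the higher score)
# Student 4: neither    -> NA
```

Students with neither score remain NA, so they drop out of the histogram and boxplots rather than being counted as zeros.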

8.6 Supporting Analysis 5: Impact of Class Size on High School Graduation

num_of_graduates <- subsetted_data %>%
  filter(hsgrdcol == "YES") %>%
  nrow()

percent_graduates <- sprintf("%.2f%%", num_of_graduates / nrow(subsetted_data) * 100)
  
num_of_non_graduates <- subsetted_data %>%
  filter(hsgrdcol == "NO") %>%
  nrow()

percent_non_graduates <- sprintf("%.2f%%", num_of_non_graduates / nrow(subsetted_data) * 100)


grad_df <- data.frame(
  Category = c("Graduated", "Not Graduated"),
  Count = c(num_of_graduates, num_of_non_graduates),
  Percentage = c(percent_graduates, percent_non_graduates)
)

# Kable

grad_df %>%
  kable(caption = "High School Graduation Status of Subsetted Students") %>%
  kable_styling(full_width = FALSE) %>%
  column_spec(1, bold = TRUE)

High School Graduation Status of Subsetted Students

Category        Count   Percentage
Graduated         541       99.08%
Not Graduated       5        0.92%

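
The table above reports graduation status for the subset as a whole. As a rough sketch of how the same summary could be broken down by class type, the example below groups a small made-up data frame by `cmpstype` (the `toy_grad` data and the column names `students`, `graduates`, and `grad_rate` are hypothetical, not results from the STAR data).

```r
library(dplyr)

# Made-up data for illustration only; not taken from the STAR dataset
toy_grad <- data.frame(
  cmpstype = c("SMALL", "SMALL", "REGULAR", "REGULAR", "AIDE", "AIDE"),
  hsgrdcol = c("YES", "YES", "YES", "NO", "YES", "YES")
)

# Count students and graduates per class type, then compute the rate
grad_by_class <- toy_grad %>%
  group_by(cmpstype) %>%
  summarise(
    students  = n(),
    graduates = sum(hsgrdcol == "YES", na.rm = TRUE),
    grad_rate = graduates / students,
    .groups = "drop"
  )

grad_by_class
```

With only 5 non-graduates in the subsetted data, per-class graduation rates computed this way would rest on very few cases and should be read with caution.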
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS 15.3.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] plotly_4.10.4     ggalluvial_0.12.5 tidyr_1.3.1       forcats_1.0.0    
##  [5] foreign_0.8-86    AER_1.2-14        survival_3.6-4    sandwich_3.1-1   
##  [9] lmtest_0.9-40     zoo_1.8-12        broom_1.0.7       MASS_7.3-60.2    
## [13] car_3.1-3         carData_3.0-5     patchwork_1.3.0   kableExtra_1.4.0 
## [17] knitr_1.48        ggthemes_5.1.0    ggplot2_3.5.1     dplyr_1.1.4      
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.5      xfun_0.51         bslib_0.8.0       htmlwidgets_1.6.4
##  [5] lattice_0.22-6    crosstalk_1.2.1   vctrs_0.6.5       tools_4.4.1      
##  [9] generics_0.1.3    tibble_3.2.1      fansi_1.0.6       highr_0.11       
## [13] pkgconfig_2.0.3   Matrix_1.7-0      data.table_1.16.4 lifecycle_1.0.4  
## [17] compiler_4.4.1    farver_2.1.2      stringr_1.5.1     munsell_0.5.1    
## [21] htmltools_0.5.8.1 sass_0.4.9        lazyeval_0.2.2    yaml_2.3.10      
## [25] Formula_1.2-5     pillar_1.9.0      jquerylib_0.1.4   cachem_1.1.0     
## [29] abind_1.4-8       tidyselect_1.2.1  digest_0.6.37     stringi_1.8.4    
## [33] purrr_1.0.2       labeling_0.4.3    splines_4.4.1     fastmap_1.2.0    
## [37] grid_4.4.1        colorspace_2.1-1  cli_3.6.3         magrittr_2.0.3   
## [41] utf8_1.2.4        withr_3.0.1       scales_1.3.0      backports_1.5.0  
## [45] httr_1.4.7        rmarkdown_2.28    evaluate_1.0.0    viridisLite_0.4.2
## [49] rlang_1.1.4       glue_1.8.0        xml2_1.3.6        svglite_2.1.3    
## [53] rstudioapi_0.17.1 jsonlite_1.8.9    R6_2.5.1          systemfonts_1.2.1